Skip to content

feat(mergeconflict): make merge-conflict check asynchronous via runway#245

Draft
behinddwalls wants to merge 1 commit into
preetam/runway-contractfrom
preetam/mergeconflict-async
Draft

feat(mergeconflict): make merge-conflict check asynchronous via runway#245
behinddwalls wants to merge 1 commit into
preetam/runway-contractfrom
preetam/mergeconflict-async

Conversation

@behinddwalls

@behinddwalls behinddwalls commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

Why?

The merge-conflict check ran synchronously inside the validate consumer by calling the mergechecker extension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled on build/buildsignal but across a service boundary.

What?

The pipeline gains validate → mergeconflict ⇢ (runway) ⇢ mergeconflictsignal → batch:

  • validate drops the inline mergechecker call and now publishes the request id to the internal mergeconflict topic (dedup + change-metadata + claim are unchanged).
  • mergeconflict (new) publishes the full MergeConflictCheckRequest to the runway-owned merge-conflict-checker queue, keyed by the request id as the client-owned correlation id. No local record is needed — the request id round-trips, so the result correlates straight back to the request (unlike build, whose server-generated id needs a mapping store).
  • mergeconflictsignal (new) consumes runway's MergeConflictCheckResult off merge-conflict-checker-signal, advances the request to batch when mergeable, or fails it (user error) when conflicted.
  • DLQ reconcilers drive the request to Error on dead-letter; the signal DLQ reads the request id straight off the result.

Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the new queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded mergechecker validate-path row noted in extension-contract.md. The mergechecker package is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope — only its contract (added in the parent PR) is consumed here.

Test Plan

bazel build //...
bazel test //... --test_tag_filters=-integration,-e2e (54 tests pass)
make gazelle clean

Stack

  1. refactor(entity): promote RequestLandStrategy to shared entity/mergestrategy #243
  2. feat(runway): add merge-conflict check wire contract and topic keys #244
  3. @ feat(mergeconflict): make merge-conflict check asynchronous via runway #245


**Outputs are unchanged except `pusher`.** This RFC moves the *input* toward identity; five of the six return contracts — conflicts, score, mergeability, change info, build id/status — are exactly what they are today. `pusher` is the lone exception: because its input becomes a *list* of independently-landed batches, its result regroups per batch (`BatchID`-tagged, per-change commit detail kept underneath) so each batch's outcome stays correlatable — the "output mirrors the input unit" principle above. No other output shape changes.

> **Note (superseded for the validate path).** The `mergechecker.MergeChecker` row above describes the *synchronous*, in-process mergeability check `validate` used to run. That check is now **asynchronous and out-of-process**: `validate` hands off to the `mergeconflict` controller, which publishes a full check request to the runway-owned `merge-conflict-checker` queue; runway performs the merge attempt and returns a result that `mergeconflictsignal` consumes (see [workflow.md](workflow.md)). The `mergechecker` extension package is kept in-tree but unused on the validate path; removing it is a follow-up. The row is retained here as the historical contract.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets update the RFC to reflect the new contract rather than adding a note here

Comment on lines +30 to +34
// TopicKeyMergeConflict is the internal pipeline stage where validated
// requests are published to start an asynchronous merge-conflict check. The
// mergeconflict controller consumes it, records the check, and publishes the
// full request to runway's merge-conflict-checker queue.
TopicKeyMergeConflict TopicKey = "mergeconflict"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

records the check? we removed storage layer so no recording anymore, request itself captures the state for it.

Comment on lines +122 to +124
// User error: reject to DLQ, where the request is reconciled to Error.
return errs.NewUserError(fmt.Errorf("request %s is not mergeable: %s", request.ID, result.Reason))
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why reject to DLQ when we know the request has failed and we should just mark it failed?

## Summary

### Why?

The merge-conflict check ran synchronously inside the `validate` consumer by calling the `mergechecker` extension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled on `build`/`buildsignal` but across a service boundary.

### What?

The pipeline gains `validate → mergeconflict ⇢ (runway) ⇢ mergeconflictsignal → batch`:

- `validate` drops the inline `mergechecker` call and now publishes the request id to the internal `mergeconflict` topic (dedup + change-metadata + claim are unchanged).
- `mergeconflict` (new) publishes the full `MergeConflictCheckRequest` to the runway-owned `merge-conflict-checker` queue, keyed by the request id as the client-owned correlation id. No local record is needed — the request id round-trips, so the result correlates straight back to the request (unlike `build`, whose server-generated id needs a mapping store).
- `mergeconflictsignal` (new) consumes runway's `MergeConflictCheckResult` off `merge-conflict-checker-signal`, advances the request to `batch` when mergeable, or fails it (user error) when conflicted.
- DLQ reconcilers drive the request to `Error` on dead-letter; the signal DLQ reads the request id straight off the result.

Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the new queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded `mergechecker` validate-path row noted in extension-contract.md. The `mergechecker` package is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope — only its contract (added in the parent PR) is consumed here.

## Test Plan

✅ `bazel build //...`
✅ `bazel test //... --test_tag_filters=-integration,-e2e` (54 tests pass)
✅ `make gazelle` clean
@behinddwalls behinddwalls force-pushed the preetam/mergeconflict-async branch from a51a86b to d565b91 Compare June 16, 2026 03:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant